An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks
نویسندگان
چکیده
Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy. A popular approach for handling outliers and mislabeled instances is to remove any instance that is misclassified by a learning algorithm. However, an examination of which learning algorithms to use for filtering as well as their effects on multiple learning algorithms over a large set of data sets has not been done. Previous work has generally been limited due to the large computational requirements to run such an experiment, and, thus, the examination has generally been limited to learning algorithms that are computationally inexpensive and using a small number of data sets. In this paper, we examine 9 learning algorithms as filtering algorithms as well as examining the effects of filtering in the 9 chosen learning algorithms on a set of 54 data sets. In addition to using each learning algorithm individually as a filter, we also use the set of learning algorithms as an ensemble filter and use an adaptive algorithm that selects a subset of the learning algorithms for filtering for a specific task and learning algorithm. We find that for most cases, using an ensemble of learning algorithms for filtering produces the greatest increase in classification accuracy. We also compare filtering with a majority voting ensemble. The voting ensemble significantly outperforms filtering unless there are high amounts of noise present in the data set. Additionally, we find that a majority voting ensemble is robust to noise as filtering with a voting ensemble does not increase the classification accuracy of the voting
منابع مشابه
A Comparative Evaluation of Curriculum Learning with Filtering and Boosting in Supervised Classification Problems
Not all instances in a data set are equally beneficial for inferring a model of the data. Some instances (such as outliers) are detrimental to inferring a model of the data. Several machine learning techniques treat instances in a data set differently during training such as curriculum learning, filtering, and boosting. However, an automated method for determining how beneficial an instance is ...
متن کاملClassification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...
متن کاملConstructing ensembles of classifiers using supervised projection methods based on misclassified instances
In this paper we propose an approach for ensemble construction based on the use of supervised projections, both linear and non-linear, to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set, it uses misc...
متن کاملPalarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
متن کاملOnline Passive Aggressive Active Learning and Its Applications
We investigate online active learning techniques for classification tasks in data stream mining applications. Unlike traditional learning approaches (either batch or online learning) that often require to request the class label of each incoming instance, online active learning queries only a subset of informative incoming instances to update the classification model, which aims to maximize cla...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1312.3970 شماره
صفحات -
تاریخ انتشار 2013